1. INTRODUCTION

Noise pollution is excessive sound that causes annoyance to living organisms. It is considered one of the critical environmental problems because it endangers public health, contributing to cardiovascular disease, hearing problems, sleep disorders and adverse social behaviour [1], [2]. In addition, [3] reported that this type of pollution influences the incidence and severity of COVID-19, attributing the effect to raised cortisol levels that weaken the immune system. Vehicle traffic is one of the main sources of this pollution in cities [4], [5]. In general, two sets of factors influence noise annoyance: i) those related to the sound itself (type of noise, level, duration and frequency spectrum), the time of day at which it occurs and the exposure, and ii) those related to the individual, including the physiological, psychological and social characteristics that shape the subjective perception of noise [5], [6].

Vehicles are predominantly sources of low- and medium-frequency noise, which has high penetrating power and propagates over long distances with little dissipative absorption. The continuous growth of the vehicle fleet has made urban traffic noise an increasing concern: it grows with the expansion of residential, industrial and commercial areas and has adverse effects on people [7], [8]. Predicting the noise level produced by urban transport is therefore an essential step in mitigating environmental pollution, and appropriate, specific mathematical tools (models) are needed that can reproduce or simulate different acoustic scenarios for use in assessment and urban planning [9], [10]. Artificial neural networks (ANNs) have recently emerged as an important area of research because they can process noise data and learn from it; in particular, this computational model allows traffic noise descriptors to be predicted and optimised [4], [11], [12].

Several methods have been proposed for assessing and predicting environmental noise. They fall mainly into three groups: physical propagation models, traditional statistical methods and machine learning methods [13], [14]. Deep learning derives from the study of ANNs, which in many areas of data science have shown a remarkable ability to learn complex, non-linear relationships between sets of variables [15], [16]. ANNs are inspired by the human brain: neurons are arranged in layers and interconnected through weighted connections and mathematical (activation) functions. Each neuron receives a weighted signal from the previous layer and processes it, so that the network learns from the examples presented by the training algorithm [17]. Common neural network models include the multilayer perceptron (MLP), convolutional neural networks (CNN) and recurrent neural networks (RNN) [18]. Their training algorithms iteratively update the model parameters until the error between the output predicted by the network and the experimentally measured value is minimised [4], [13], [19], [20].
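To make the iterative parameter update described above concrete, the following Python sketch trains a one-hidden-layer MLP with backpropagation and stochastic gradient descent on a squared-error loss. It is a minimal illustration under stated assumptions (sigmoid hidden units, one linear output unit, a fixed learning rate and a toy synthetic dataset); the names `TinyMLP`, `train_step` and `mse` are hypothetical, and this is not the Weka implementation used in this study.

```python
import math
import random


def sigmoid(z):
    """Logistic activation used by the hidden neurons (an assumption of this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """Hypothetical n_in -> n_hidden (sigmoid) -> 1 (linear) perceptron."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Small random initial weights; biases start at zero.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Each hidden neuron applies the sigmoid to a weighted sum of the inputs.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        # The single output neuron is a linear combination of the hidden activations.
        y = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return h, y

    def train_step(self, x, target, lr):
        """One backpropagation update for the loss 0.5 * (y - target) ** 2."""
        h, y = self.forward(x)
        err = y - target  # derivative of the loss with respect to the output
        for j, hj in enumerate(h):
            # Error signal of hidden unit j, computed with w2[j] before it is updated.
            delta_j = err * self.w2[j] * hj * (1.0 - hj)
            self.w2[j] -= lr * err * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta_j * xi
            self.b1[j] -= lr * delta_j
        self.b2 -= lr * err
        return 0.5 * err * err


def mse(model, data):
    """Mean squared error of the model over (inputs, target) pairs."""
    return sum((model.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


# Toy data (not from this study): target = 0.3 * x0 + 0.5 * x1 on [0, 1]^2.
rng = random.Random(1)
data = []
for _ in range(50):
    a, b = rng.random(), rng.random()
    data.append(([a, b], 0.3 * a + 0.5 * b))

net = TinyMLP(n_in=2, n_hidden=4)
before = mse(net, data)
for _ in range(200):  # epochs
    for x, t in data:
        net.train_step(x, t, lr=0.1)
after = mse(net, data)
print(f"MSE before: {before:.4f}, after: {after:.4f}")
```

The error falls as the weights are repeatedly adjusted against the gradient, which is the minimisation loop the paragraph describes; the learning rate `lr` controls the size of each step.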

This research proposes an MLP ANN as a model for predicting urban environmental noise in the city of Jaén. The MLP network type was chosen for its versatility and predictive capacity. The network was developed following the knowledge discovery in databases (KDD) method, an ordered sequence of steps for extracting accurate information from data. Modelling was carried out in the Weka software using environmental monitoring data from studies conducted by other authors in the same city between 2016 and 2020. The data were divided into 80% for training and 20% for validation, so that the model could be evaluated on data not used during training and overfitting could be detected.


2. METHOD 

Studies of sound levels in the urban area of Jaén for the period 2016-2020 were considered. Their databases were retrieved from the institutional repositories of Peruvian universities and evaluated against standardisation criteria for the information reported by the authors. As inclusion and exclusion criteria, a data source had to report the sound pressure level of the vehicle fleet together with the following variables: road name, coordinates of the sampling points, time and date of data collection, maximum sound pressure level (Lmax), minimum sound pressure level (Lmin), number of motokars (three-wheeled motorcycle taxis), number of linear motorcycles (two-wheeled motorcycles) and number of cars, each counted per time unit and sampling point, and the equivalent continuous sound pressure level (LAeqT). The ANNs were developed using the KDD method in the free software Weka, with the backpropagation learning algorithm (80% of the data for training and 20% for validation).
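The 80/20 division can be sketched as a random hold-out split. This is a minimal illustration assuming the records are shuffled with a fixed seed before cutting; the source does not state whether the split in Weka was randomised or order-preserving, and `holdout_split` is a hypothetical helper.

```python
import random


def holdout_split(records, train_fraction=0.8, seed=42):
    """Shuffle a copy of the records and split them into training and validation subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Stand-in for 100 monitoring records (hypothetical, not the T1/T2 data).
rows = list(range(100))
train, valid = holdout_split(rows)
print(len(train), len(valid))  # 80 20
```

Because the validation records never enter training, the statistics computed on them estimate how the network behaves on unseen data.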

To select the independent variables that influence the equivalent continuous sound pressure level, the CorrelationAttributeEval attribute evaluator was used, retaining input variables with importance values greater than or equal to 0.1. The correlation coefficient (R), the coefficient of determination (R²) and the root mean square error (RMSE) were selected as criteria for evaluating model performance. To visualise the relationship between the actual and the predicted sound pressure level (LAeqT), the actual and predicted values obtained in Weka were fitted by linear regression in SPSS. Figure 1 (in Appendix) shows the methodological flow for developing the proposed neural network.
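The three evaluation criteria can be written out explicitly. The sketch below computes R as the Pearson correlation between observed and predicted values, R² as its square (the R and R² pairs in Tables 4 and 5 are consistent with this, e.g. 0.9861² ≈ 0.9723), and RMSE in the units of the target. The function names and the four LAeqT values are hypothetical, for illustration only.

```python
import math


def pearson_r(actual, predicted):
    """Correlation coefficient R between observed and predicted values."""
    n = len(actual)
    ma = sum(actual) / n
    mp = sum(predicted) / n
    s_ap = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    s_aa = sum((a - ma) ** 2 for a in actual)
    s_pp = sum((p - mp) ** 2 for p in predicted)
    return s_ap / math.sqrt(s_aa * s_pp)


def rmse(actual, predicted):
    """Root mean square error, in the units of the target (here LAeqT)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


# Hypothetical LAeqT values, for illustration only.
actual = [70.0, 72.0, 74.0, 76.0]
predicted = [70.5, 71.5, 74.5, 75.5]
r = pearson_r(actual, predicted)
print(f"R = {r:.3f}, R2 = {r ** 2:.3f}, RMSE = {rmse(actual, predicted):.3f}")
# R = 0.976, R2 = 0.953, RMSE = 0.500
```

R and R² measure how well the predictions track the observations (values near 1 are good), whereas RMSE measures the typical size of the error, so a lower value is better.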


3. RESULTS AND DISCUSSION 
3.1. Collection of noise pollution data from the urban area of Jaén

The institutional repositories of the country's universities were searched for studies carried out during 2016-2020. Five data sources were preliminarily evaluated (Table 1), of which two met the inclusion and exclusion criteria detailed above: i) sound pressure level by the vehicle fleet in the city of Jaén, December 2018 to February 2019 (T1), and ii) evaluation of vehicular noise pollution based on Supreme Decree N° 085-2003-PCM, Regulation of Environmental Quality Standards for Noise, carried out in the province of Jaén, department of Cajamarca, 2016 (T2). It was verified that the data used in the study were obtained with sound level meters calibrated by Peru's National Institute of Quality (INACAL).


3.2. Artificial neural network for estimating noise pollution in the urban area of Jaén
3.2.1. Attribute selection for ANN, T1 and T2 

The data for both T1 and T2 were divided into 80% for training and 20% for validation. The CorrelationAttributeEval attribute evaluator with the Ranker search method in Weka was applied to the 80% training portion of T1 and T2 to obtain the importance value of each input variable. Tables 2 and 3 show the importance values of the nine variables evaluated for T1 and T2; to define the input variables, only those with importance values ≥ 0.1 were retained, reducing the number of model inputs to six.


Table 1. Sources of literature review

Name of research | Author(s) | University
Evaluation of sound pressure levels in commercial establishments in the urban area of the city of Jaén, based on Supreme Decree N° 085-2003-PCM | Gianela Olivera Zurita, Kiara Belkiss Silva Vega | National University of Jaén
Environmental Quality Standards (ECAS) for noise in the main university higher education centres in the city of Jaén | Felipe Nery Silva Cabrera | National University of Jaén
Sound pressure levels in the markets of the city of Jaén, Cajamarca - 2019 | Katiri Tatiana Estela Carranza, Jefferson Jair Goicochea Pérez | National University of Jaén
Evaluation of vehicular noise pollution based on Supreme Decree N° 085-2003-PCM, Regulation of Environmental Quality Standards for Noise, carried out in the province of Jaén, department of Cajamarca, 2016 | Cintia Karely Cruzado Ancajima, Yanira Susana Soto Medina | National University of Jaén
Sound pressure level by the vehicle fleet in the city of Jaén, December 2018 to February 2019 | Elser Burga Mendoza | National University of Jaén


Table 2. T1, attribute selection (80% of the data)


Variable Importance values
Lmax 0.981
Lmin 0.7142
CoordenadaUTM 0.2799
NombredelaVia 0.2799
Hora 0.155
Motokar 0.1481
Carros 0.0868
MotoLineal 0.0536
Fecha -0.0327


Table 3. T2, attribute selection (80% of the data)


Variable Importance values 

Lmax 0.8943 
Lmin 0.3823 
MotoLineal 0.1882 
Motokar 0.1818 
CoordenadasUTM 0.1543 
NombredelaVia 0.1475 
Carros 0.0788 
Hora 0.0655 
Fecha 0.051 
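The selection rule applied to Tables 2 and 3 (rank by importance, keep values ≥ 0.1) can be reproduced from the reported numbers. Weka's documentation describes CorrelationAttributeEval as measuring the Pearson correlation between each attribute and the class; the sketch below does not recompute those correlations and only applies the ranking and threshold step to the published values. `select_inputs` is a hypothetical helper.

```python
# Importance values as reported in Tables 2 and 3 (80% training portion).
T1_IMPORTANCE = {
    "Lmax": 0.981, "Lmin": 0.7142, "CoordenadaUTM": 0.2799, "NombredelaVia": 0.2799,
    "Hora": 0.155, "Motokar": 0.1481, "Carros": 0.0868, "MotoLineal": 0.0536,
    "Fecha": -0.0327,
}
T2_IMPORTANCE = {
    "Lmax": 0.8943, "Lmin": 0.3823, "MotoLineal": 0.1882, "Motokar": 0.1818,
    "CoordenadasUTM": 0.1543, "NombredelaVia": 0.1475, "Carros": 0.0788,
    "Hora": 0.0655, "Fecha": 0.051,
}


def select_inputs(importance, threshold=0.1):
    """Rank attributes by importance (like a Ranker search) and keep those >= threshold."""
    # sorted() is stable, so tied attributes keep their table order.
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, value in ranked if value >= threshold]


print(select_inputs(T1_IMPORTANCE))
print(select_inputs(T2_IMPORTANCE))
```

Both calls return six attributes, matching the six input neurons of the 6-19-1 and 6-15-1 networks.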


3.2.2. Training of ANNs 

For T1, an ANN with a 6-19-1 architecture was obtained: six (6) input neurons, one hidden layer with nineteen (19) nodes and an output layer with the dependent variable LAeqT (Figure 2(a)). For T2, a 6-15-1 architecture was obtained: six (6) input neurons, one hidden layer with fifteen (15) nodes and an output layer with the dependent variable LAeqT (Figure 2(b)). The fit statistics computed on the training sets of ANN-T1 and ANN-T2 were R, R² and RMSE (Table 4), and acceptable values were obtained for both models. For T1 and T2, R and R² were close to unity, indicating good model performance. The RMSE values, by contrast, were close to 1 (0.7952 for T1 and 0.99 for T2), indicating a higher error; this is attributed to the learning rate used in training, since high learning rates make training converge faster but fit the trainable parameters less accurately, resulting in higher errors [21]. The training results are shown in Figures 3 and 4 as scatter plots of predicted versus actual LAeqT. For model T1, the line fitted between observed and estimated LAeqT is y = 2.48 + 1.01x, with a coefficient of determination R² = 0.98, whereas for model T2 the fitted line is y = 13.05 + 0.83x, with R² = 0.92.
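The fitted lines above come from a linear regression of predicted LAeqT (y) on actual LAeqT (x), which the study ran in SPSS. The sketch below shows the ordinary least-squares computation behind such a line; `fit_line` and the four data pairs are hypothetical. An ideal predictor gives intercept 0 and slope 1, so departures from those values indicate systematic bias even when R² is high.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx                 # slope
    a = my - b * mx               # intercept
    r2 = sxy ** 2 / (sxx * syy)   # coefficient of determination of the fit
    return a, b, r2


# Hypothetical (actual, predicted) LAeqT pairs, for illustration only.
actual = [70.0, 72.0, 74.0, 76.0]
predicted = [71.0, 72.5, 74.0, 75.5]
a, b, r2 = fit_line(actual, predicted)
print(f"y = {a:.2f} + {b:.2f}x, R2 = {r2:.2f}")  # y = 18.50 + 0.75x, R2 = 1.00
```

In this toy case the points lie exactly on a line (R² = 1), yet the slope of 0.75 and intercept of 18.5 show that the predictions are compressed relative to the actual values.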


3.3. Validation of the artificial neural network 


For the validation of the ANNs, the same fit statistics as in the training stage were used (Table 5), this time computed on the 20% of the data from each study that was held out for validation. Both models obtained good values of R and R², close to unity. They differed in RMSE: T2 presented a lower (better) value than T1 (0.1515 versus 0.7313), although the values were acceptable for both models.
Figure 2. Architecture of the ANNs: (a) ANN-T1 and (b) ANN-T2


Table 4. Values of the statistics obtained in the training of ANN-T1 and ANN-T2
Statistic ANN-T1 ANN-T2
R 0.9861 0.9606
R² 0.9723 0.9227
RMSE 0.7952 0.99
Figure 3. Predicted LAeqT vs. actual LAeqT of T1


[Plot: predicted LAeqT (y-axis) vs. actual LAeqT (x-axis), fitted line y = 13.05 + 0.83x]


Figure 4. Predicted LAeqT vs. actual LAeqT of T2 


Table 5. Values of the statistics obtained in the validation of ANN-T1 and ANN-T2
Statistic ANN-T1 ANN-T2
R 0.9927 0.9989
R² 0.9854 0.9978
RMSE 0.7313 0.1515


The performance of the neural networks was assessed through the R, R² and RMSE statistics. The networks performed well in both the training and validation stages, with R and R² values close to unity, indicating close agreement between the values estimated by the network and the measured data. The validation results were better than the training results: R = 0.9927, R² = 0.9854 and RMSE = 0.7313 for T1, and R = 0.9989, R² = 0.9978 and RMSE = 0.1515 for T2. Compared with the results of other authors, such as Mansourkhaki et al. [22], who obtained R = 0.992 and R² = 0.983, and Sequeira et al. [23], who obtained R = 0.995, R² = 0.991 and RMSE = 0.44, the MLP networks obtained for both T1 and T2 show high efficiency in predicting the equivalent continuous sound pressure level (LAeq). These results are analogous to those of Genaro García, who compared the efficiency of an ANN with mathematical models (RLS-90 and technical guidelines for noise impact assessment (Criterion)) and found that the ANN performed better, and to those of Chen et al. [24], who compared two types of neural networks, the MLP network and the radial basis function (RBF) network, for predicting traffic noise and showed that the MLP network outperformed the RBF network in predicting noise level [25].


4. CONCLUSION 

An ANN model of the MLP type, trained with the backpropagation algorithm, was developed to estimate the equivalent continuous sound pressure level (LAeqT), using as inputs the variables that contributed most to the estimation of the dependent variable during the training stage. For T1 (Burga Mendoza) these were road name, UTM coordinates, time, Lmax, Lmin and number of motokars; for T2 (Cruzado Ancajima and Soto Medina) they were road name, UTM coordinates, Lmax, Lmin, number of motokars and number of linear motorcycles. A 6-19-1 structure was obtained for T1 (Burga Mendoza) and a 6-15-1 structure for T2 (Cruzado Ancajima and Soto Medina). The validation results show that the network created for T1 estimates the sound pressure level with R = 0.9927 and R² = 0.9854, and the network for T2 with R = 0.9989 and R² = 0.9978.